Abstract

We present a structural data set of the 20 proteinogenic amino acids
and their amino-methylated and acetylated (capped) dipeptides. Different
protonation states of the backbone (uncharged and zwitterionic) were
considered for the amino acids, as were varied side-chain protonation
states. Furthermore, we studied amino acids and dipeptides in complex
with divalent cations (Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+). The
database covers the conformational hierarchies of 280 systems in a wide
relative energy range of up to 4 eV (390 kJ/mol), summing up to a total
of 45,892 stationary points on the respective potential-energy surfaces. All
systems were calculated on equal first-principles footing, applying
density-functional theory in the generalized gradient approximation corrected for
long-range van der Waals interactions. We show good agreement with available
experimental data for gas-phase ion affinities. Our curated data can
be utilized, for example, for a wide comparison across chemical space of
the building blocks of life, for the parametrization of protein force fields,
and for the calculation of reference spectra for biophysical applications.
Background & Summary

Proteins are the machinery of life. Here we present a first-principles study of the
conformational preferences of their basic building blocks. Specifically, as summarized
in Figure 1, we consider 20 proteinogenic amino acids and dipeptides with different
possible protonation states, as well as the changes in conformational space that result
from attaching six divalent cations, i.e., Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+. In
past studies, a wide range of different approximate electronic structure methods
has been applied to some of these proteinogenic amino acids; see, for example,
references [1-59]. These studies have deepened our understanding of the
conformational basics of individual building blocks, but a systematic comparison of
properties of the different building blocks is complicated when relying on data
from different sources. On the one hand, this is due to the molecular models,
which may differ in protonation states and backbone capping. On the other hand,
the simulations can differ in several ways:


• Different sampling strategies or methods to generate conformers may have 
been used. Search-dependent settings, like energy cut-offs, can also have 
a significant impact on the results. 

• The levels of theory that have been applied range from semi-empirical
methods to Hartree-Fock (HF) and density-functional theory (DFT), up to
coupled-cluster calculations [...].

• Numerical settings, e.g., basis sets, can differ substantially and might lead 
to different results. 
A further point that limits a quantitative comparison is the accessibility of
the data from different studies. Energies, for example, often have to be extracted
from table footnotes, and the structural data is not always accessible in the
Supporting Information of the respective articles, sometimes being available only
as figures in the manuscript. The data set presented here overcomes such
limitations by covering a comprehensive segment of chemical space exhaustively,
using a large-scale computational effort. This study treats 20 proteinogenic
amino acids, their dipeptides, and their interactions with the divalent cations
Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, and Hg2+ (see Figure 1 for an overview) on the
same theoretical footing. The importance of peptide-cation interactions is
highlighted by the fact that about 40% of all proteins bind cations [60-62].

Ca2+ in particular is important in a multitude of functions, ranging, for example,
from blood clotting [...] to cell signaling to bone growth [...]. Such calcium-mediated
functions can be disturbed by the presence of alternative divalent heavy-metal
cations like Pb2+, Cd2+, and Hg2+.

The conformations and total energies of each molecular system are calculated
from first principles in the framework of density-functional theory (DFT) [67, ...]
using the PBE generalized-gradient exchange-correlation functional [69]. Energies
are corrected for van der Waals interactions using the Tkatchenko-Scheffler
formalism. In this formalism, pairwise C6[n]/r^6 terms are computed and
summed up for all pairs of atoms, where r is the interatomic distance; a cut-off for
short interatomic distances is applied, and the C6[n] coefficients are obtained from
the self-consistent electron density. The combined approach is referred to as
“PBE+vdW” throughout this work. This level of theory is robust for potential-energy
surface (PES) sampling of peptide systems [..., 78]. The curated data is
provided as a basis for comparative studies across chemical space to reveal
conformational trends and energetic preferences. It can, for example, also be
used for force-field development, theoretical studies at higher levels of theory,
and as a starting point for theoretical calculations of spectra for biophysical
applications.
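To make the pairwise form concrete, the following Python sketch evaluates a damped C6/r^6 sum. It is a minimal illustration, not the FHI-aims implementation. It assumes the Fermi-type damping function with the PBE parameters d = 20 and s_R = 0.94 from the original Tkatchenko-Scheffler work, the usual combination rule for heteronuclear C6 coefficients, and atomic units. The per-atom C6, polarizability (`alpha`) and vdW radius (`r0`) values are taken as given inputs here, whereas the actual method rescales them from the self-consistent electron density. All function names are hypothetical.

```python
import math


def ts_damping(r, r0_ab, d=20.0, s_r=0.94):
    """Fermi-type damping; d = 20 and s_R = 0.94 (PBE) are assumed values."""
    return 1.0 / (1.0 + math.exp(-d * (r / (s_r * r0_ab) - 1.0)))


def combine_c6(c6_a, c6_b, alpha_a, alpha_b):
    """Combination rule for the heteronuclear C6 coefficient of an atom pair."""
    return 2.0 * c6_a * c6_b / (alpha_b / alpha_a * c6_a + alpha_a / alpha_b * c6_b)


def pairwise_dispersion(coords, c6, alpha, r0):
    """E_disp = -sum over pairs A<B of f_damp(R_AB) * C6_AB / R_AB**6 (atomic units).

    coords: list of (x, y, z) positions in bohr; c6, alpha, r0: per-atom inputs
    that the real method derives from the electron density (assumed given here).
    """
    energy = 0.0
    for a in range(len(coords)):
        for b in range(a + 1, len(coords)):
            r = math.dist(coords[a], coords[b])
            c6_ab = combine_c6(c6[a], c6[b], alpha[a], alpha[b])
            r0_ab = r0[a] + r0[b]  # sum of the two effective vdW radii
            energy -= ts_damping(r, r0_ab) * c6_ab / r**6
    return energy
```

For two atoms far apart, the damping factor approaches 1 and the sum reduces to -C6/r^6; at short range the damping switches the term off, which plays the role of the short-distance cut-off mentioned above.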


Methods 

Molecular models 

This study covers a total of 280 molecular systems (summarized in Figure 1).
The number is the product of the following chemical degrees of freedom
considered in our study:

20 proteinogenic amino acids. In the case of (de)protonatable side chains, all
protomers (different protonation states) were considered as well.

2 different backbone types: either free termini (considered in uncharged or
zwitterionic form) or capped (N-terminally acetylated and C-terminally
amino-methylated).

7, reflecting that the respective amino acid or dipeptide was considered either in
isolation or with one of six different cation additions: Ca2+, Ba2+, Sr2+,
Cd2+, Pb2+, or Hg2+.

Conformational search and energy functions 

For the initial scan of the PES, the empirical force field OPLS-AA was
employed, followed by DFT-PBE+vdW relaxations of the energy minima
identified with the force field. The resulting set of structures was then subjected to
a further first-principles refinement step, ab initio replica-exchange molecular
dynamics (REMD). An overview of the procedure is given in Figure 2, and the
steps are described in more detail below.

Force-field based (OPLS-AA) global conformational searches (Step 1)
were performed for all dipeptides and amino acids (i) without a coordinating
cation and (ii) with Ca2+. These searches employed a basin hopping search
strategy [...] as implemented in the tool “scan”, distributed with the Tinker
molecular simulation package [..., 83]. We here use an in-house parallelized
version of the Tinker scan utility that was first used in reference [...]. In this
search strategy, input structures for relaxations are generated by projecting
along normal modes starting from a local minimum. The number of search
directions from a local minimum was set to 20. Conformers were accepted within
a relative energy window of 100 kcal/mol and if they differed in energy from
already found minima by at least 10^-4 kcal/mol. The search terminates when the
relaxations of input structures do not result in new minima.
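The two acceptance criteria stated above (an energy window relative to the lowest minimum found so far, and a minimum energy difference from already known minima) can be sketched in Python as follows. This is an illustrative filter only, assuming energies in kcal/mol and the window and tolerance values given above; it does not reproduce Tinker's scan code, and the helper names are hypothetical.

```python
def accept_minimum(energy, known, window=100.0, tol=1e-4):
    """Decide whether a newly relaxed local minimum is kept (energies in kcal/mol).

    energy: energy of the new minimum; known: energies of minima accepted so far.
    """
    lowest = min(known + [energy])
    if energy - lowest > window:
        return False  # outside the relative energy window
    # energies closer than tol to a known minimum are treated as duplicates
    return all(abs(energy - e) >= tol for e in known)


def collect_minima(candidate_energies, window=100.0, tol=1e-4):
    """Filter a stream of relaxed minima; a search round that adds nothing new
    corresponds to the termination condition described above."""
    known = []
    for energy in candidate_energies:
        if accept_minimum(energy, known, window, tol):
            known.append(energy)
    return known
```

For example, `collect_minima([0.0, 0.00005, 5.0, 150.0, 5.0])` keeps only `0.0` and `5.0`: the second value is a duplicate within the tolerance, `150.0` lies outside the window, and the last value repeats an accepted minimum.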

After that, PBE+vdW relaxations (Step 2) were performed with the
program FHI-aims [84-86]. FHI-aims employs numeric atom-centered orbital
basis sets, as described in reference [...], to discretize the Kohn-Sham orbitals.
Different levels of computational defaults are available, distinguished by the choice
of the basis set, the integration grids, and the order of the multipole expansion of
the electrostatic (Hartree) potential of the electron density. For the chemical
elements relevant to this work, “light” settings include the so-called tier1 basis
sets and were used for initial relaxations. “Tight” settings include the larger
tier2 basis sets and ensure converged conformational energy differences at the
level of a few meV [...]. Unless noted otherwise, all energies discussed here are
results of PBE+vdW calculations with a tier2 basis and “tight” settings.
Relativistic effects were taken into account by the so-called atomic zero-order regular
approximation (atomic ZORA) [88], as described in reference [...]. Previous
comparisons to high-level quantum chemistry benchmark calculations at the
coupled-cluster level, CCSD(T), demonstrated the reliability of this approach
for polyalanine systems [72, 76], for alanine-, phenylalanine-, and glycine-containing
tripeptides [...], and for alanine dipeptides with Li+ [...]. Further benchmarks at
the MP2 level of theory are reported below in the section Technical Validation.

The refinement (Step 3) by ab initio REMD is intended to alleviate
the potential effects of differences in the conformational energy landscape between
the force field and the DFT method. In REMD, multiple molecular dynamics
trajectories of the same system are independently initialized and run at a range
of different temperatures. Based on a Metropolis criterion, configurations are
swapped between trajectories at neighboring temperatures. Thus, the simulations
can overcome barriers and provide enhanced conformational sampling
in comparison to classical molecular dynamics (MD) [...]. The simulations
were carried out employing a script-based REMD scheme that is provided with
FHI-aims and that was first used in reference [...]. Computations were performed
at the PBE+vdW level with “light” computational settings. The run
time for each REMD simulation was 20 ps with an integration time step of
1 fs. The frequent exchange attempts (every 0.04 or 0.1 ps) ensure efficient sampling
of the potential-energy surface, as shown by Sindhikara et al. [...]. The
velocity-rescaling approach by Bussi et al. was used to sample the canonical
distribution. Starting geometries for the replicas were taken from the lowest-energy
conformers resulting from the PBE+vdW relaxations in Step 2. REMD
parameters for the individual systems, i.e., the number of replicas, acceptance
rates for exchanges between replicas, the frequency of exchange attempts, and
the temperature range, are summarized in Table S1 of the Supporting Material.
Conformations were extracted from the REMD trajectories every 10th
step, i.e., every 10 fs of simulation time. In order to generate a set of
representative conformers, these structures were clustered using a k-means clustering
algorithm with a cluster radius of 0.3 Å as provided by the MMTSB package [...].
The resulting arithmetic-mean structures from each cluster were then
relaxed using PBE+vdW with “light” computational settings. The obtained
conformers were again clustered, and cluster representatives were relaxed with
PBE+vdW (“tight” computational settings) to obtain the final conformational
hierarchies. The refinement step by REMD is essential, as shown in Figure 3,
which separately identifies the number of distinct conformers found in Step 2
and, subsequently, the number of additional conformers found in Step 3.
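The Metropolis criterion for swapping configurations between neighboring temperatures can be sketched in Python as below. This is the generic textbook form of the replica-exchange acceptance rule, not the FHI-aims REMD script. Any temperatures passed in are placeholders (the actual ladders are listed in Table S1), and the alternating even/odd pairing controlled by the hypothetical `offset` argument is an assumption.

```python
import math
import random

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def exchange_probability(e_i, e_j, t_i, t_j):
    """min(1, exp[(beta_i - beta_j) * (E_i - E_j)]) for replicas at T_i and T_j (energies in eV)."""
    beta_i = 1.0 / (K_B_EV * t_i)
    beta_j = 1.0 / (K_B_EV * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_neighbor_swaps(energies, temperatures, rng, offset=0):
    """One round of swap attempts between neighboring temperatures.

    energies[k] is the potential energy of the configuration currently at temperatures[k].
    Returns owner, where owner[k] is the index of the configuration now held at temperature k.
    Pairs (offset, offset+1), (offset+2, offset+3), ... are attempted, so no replica is used twice.
    """
    owner = list(range(len(energies)))
    for k in range(offset, len(temperatures) - 1, 2):
        a, b = owner[k], owner[k + 1]
        p = exchange_probability(energies[a], energies[b], temperatures[k], temperatures[k + 1])
        if rng.random() < p:
            owner[k], owner[k + 1] = b, a
    return owner
```

With this rule, a lower-temperature replica that holds the higher-energy configuration always swaps upward (probability 1), while the reverse move is accepted only with a Boltzmann-weighted probability; `attempt_neighbor_swaps([1.0, 0.5], [300.0, 400.0], random.Random(0))` therefore returns `[1, 0]`.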

After Step 2, a total of 17,381 stationary points was found for the amino acids
and dipeptides in isolation and in complex with Ca2+. The refinement procedure
in Step 3 increases this number to a total of 21,259 structures. Initial structures
for the Ba2+, Cd2+, Hg2+, Pb2+, and Sr2+ binding amino acid and dipeptide
systems were then obtained by replacing the Ca2+ cation in the amino acid and
dipeptide structures binding a Ca2+ cation. These structures were subsequently
relaxed with PBE+vdW employing “tight” computational settings and a
tier-2 basis set. This procedure results in 24,633 further conformers with bound
cations. Altogether, we thus provide information on 45,892 stationary points of
the PBE+vdW PES for all systems studied in this work.

The numbers of conformers identified in the searches are also given in Table 
S2 of the Supporting Material. Tables S3 and S4 provide detailed accounts of 
how many structures were found for which amino acid/dipeptide in isolation or 
with attached cations. 


Data Records 

The curated data, consisting of the Cartesian coordinates of 45,892 stationary-point
geometries of the PBE+vdW PES (the main outcome of our work) and
their potential energies computed at the “tight”/tier-2 level of accuracy in the
FHI-aims code, is provided as plain text files sorted in directories (see Figure 4).
The PBE+vdW total energies are included since they are an integral part of 
the construction of our geometry data sets. Importantly, the stationary point 
geometries could be used as starting points to refine the total energy accuracy 
by higher-level methods, e.g., those discussed in “Technical Validation” below. 
The folder structure is hierarchical and straightforward. The naming scheme is
explained in the following:

Description of the file types: 

conformer.(...).xyz coordinates in standard xyz format in Å, readable by a
wide range of molecule viewers, e.g., VMD, Jmol, etc.

conformer.(...).fhiaims coordinate file in FHI-aims geometry input format:
for each atom of the particular system, the Cartesian coordinates are given
in Å (atom [x] [y] [z] [element]). The electronic total energy (in eV)
at the PBE+vdW level is given there as a comment.

control.in FHI-aims input file with technical parameters for the calculations.
Please note that these files also include the exact specifications of the
“tight” numerical settings for all included elements.

hierarchy_PBE+vdW_tier-2.dat in each final subfolder, contains three
columns: number of the conformer, total energy (in eV, PBE+vdW, tier-2
basis set, “tight” numerical settings, computed with FHI-aims version
031011), and relative energy (in eV, relative to the respective global
minimum).
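For scripted analysis, the three-column hierarchy files can be read with a few lines of Python. The sketch below assumes whitespace-separated columns in the order described above and simply skips any line that does not parse as such (for example a header, whose presence we have not verified). The function names and the energy-window helper are hypothetical.

```python
from pathlib import Path

EV_TO_KCAL_PER_MOL = 23.0605  # 1 eV is about 23.06 kcal/mol


def parse_hierarchy(text):
    """Return (conformer number, total energy [eV], relative energy [eV]) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or short line
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2])))
        except ValueError:
            continue  # non-numeric line, e.g. a header (format assumption)
    return rows


def read_hierarchy(path):
    """Read a hierarchy_PBE+vdW_tier-2.dat file from disk."""
    return parse_hierarchy(Path(path).read_text())


def conformers_within(rows, max_relative_kcal):
    """Conformer numbers whose relative energy lies within max_relative_kcal (kcal/mol)."""
    return [num for num, _, rel in rows if rel * EV_TO_KCAL_PER_MOL <= max_relative_kcal]
```

Keeping parsing (`parse_hierarchy`) separate from file access (`read_hierarchy`) makes it easy to apply the same logic to text obtained from the website or the repositories without writing it to disk first.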

The curated data is publicly available from several sources:

1. A website dedicated to this data set has been set up¹ and allows users
to browse and download the data and to visualize molecular structures
online.

2. From the NOMAD repository², the data is available via the DOI
10.17172/NOMAD/20150526220502³ [Data citation 1].

3. In addition, the data has been uploaded to DRYAD⁴ and has been assigned
the DOI 10.5061/dryad.vdl77⁵ [Data citation 2].


Technical Validation 


The conformational coverage for the amino acid alanine is validated by comparing
to a recent study by Maul et al. [...]. In that reference, 10 low-energy
conformers of alanine were reported, spanning an energy range of approximately
0.26 eV between the reported lowest- and highest-energy conformers. The level of
theory used by Maul et al. was DFT in the generalized gradient approximation
by means of the Perdew-Wang 1991 functional [...]. In our case, the force-field
based search step with subsequent PBE+vdW relaxations yields 5 conformers.
The subsequent ab initio REMD simulations increase the number of conformers to
15 within an energy range of 0.43 eV. The respective conformational energy
hierarchies after global search and after REMD refinement are shown in Figure 5a.
The results of our search (with the refinement step) are in good agreement with
the data from reference [...], which are also shown in Figure 5a. Structures are
shown in Figure 5b. Nine of the ten conformers identified by Maul et al. can
be confirmed. The single conformer that is missing (highlighted by an X in
Figure 5) is not a stationary point of the PBE+vdW potential-energy
surface. Conformers 14 and 15 are classified as saddle points by analysis of the
vibrational modes.
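Classifying a stationary point by its vibrational modes amounts to counting the negative-curvature directions of the Hessian: none for a minimum, one for a first-order saddle point. The Python sketch below does this with a plain cyclic Jacobi eigenvalue iteration. It is a simplified illustration under stated assumptions: a real analysis diagonalizes the mass-weighted Cartesian Hessian and removes the six translational and rotational zero modes, for which the tolerance `tol` here is only a crude stand-in. The function names are hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, max_sweeps=100, tol=1e-12):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [[float(v) for v in row] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) element vanishes
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def classify_stationary_point(hessian, tol=1e-6):
    """'minimum', 'first-order saddle point', or 'saddle point of order k'."""
    n_negative = sum(1 for ev in symmetric_eigenvalues(hessian) if ev < -tol)
    if n_negative == 0:
        return "minimum"
    if n_negative == 1:
        return "first-order saddle point"
    return f"saddle point of order {n_negative}"
```

For a toy surface such as f(x, y) = x² - y², the Hessian [[2, 0], [0, -2]] has one negative eigenvalue, so the origin is reported as a first-order saddle point.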

¹ http://aminoaciddb.rz-berlin.mpg.de
² http://nomad-repository.eu
³ http://dx.doi.org/10.17172/NOMAD/20150526220502
⁴ https://datadryad.org
⁵ http://dx.doi.org/10.5061/dryad.vdl77

In order to further quantify the reliability of the DFT-PBE+vdW level of
theory for peptides beyond earlier benchmark work [72, 73, 76], especially in the
presence of divalent cations, benchmark calculations were performed at the level of
Møller-Plesset second-order perturbation theory (MP2) [..., 99] using the electronic
structure program package ORCA [100]. Single-point energy calculations
were performed for all fixed stationary-point DFT-PBE+vdW geometries in our
database for the amino acids alanine (Ala) and phenylalanine (Phe) with neutral
N and C termini, in isolation as well as in complex with a Ca2+ cation.
Phe was selected to represent a “difficult” example, i.e., the interaction of the
cation with a larger aromatic side chain. The MP2 calculations did not include
any frozen-core treatment (including semicore states is essential for Ca2+) and
were performed using Dunning’s correlation-consistent polarized core-valence
basis sets (cc-pCVnZ), with n = T/Q/5 denoting the triple-zeta, quadruple-zeta,
and quintuple-zeta basis sets, respectively [101]. The calculated SCF (Hartree-Fock)
and MP2 correlation energies were then individually extrapolated to the
complete basis set (CBS) limit as follows. For SCF energies, we used the
extrapolation strategy proposed by Karton and Martin [102]:


$$E_{\mathrm{SCF}}^{n} = E_{\mathrm{SCF}}^{\mathrm{CBS}} + A\,e^{-\alpha\sqrt{n}} \qquad (1)$$


$A$, $\alpha$, and the CBS-extrapolated energy $E_{\mathrm{SCF}}^{\mathrm{CBS}}$ are parameters determined from
a least-squares fitting algorithm applied individually for each conformer. For the
MP2 correlation energies, an extrapolation scheme proposed by Truhlar [103]
was applied:


$$E_{\mathrm{corr}}^{n} = E_{\mathrm{corr}}^{\mathrm{CBS}} + B\,n^{-\beta} \qquad (2)$$
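Because three basis-set levels (n = 3, 4, 5 for T/Q/5) are fitted with three parameters, the least-squares fits of Eqs. (1) and (2) reduce to exact interpolation, as also noted for Tables S5a and S5b. A Python sketch under that assumption: the nonlinear parameter (α or β) is found by bisection on the ratio of successive energy differences, after which the prefactor and the CBS limit follow linearly. The bracketing interval and all function names are illustrative choices, not the procedure used to produce Table S5.

```python
import math


def _three_point_fit(ns, energies, basis, lo=0.01, hi=20.0, iterations=200):
    """Solve E(n) = E_cbs + amp * basis(n, p) exactly through three points.

    The ratio (E1 - E2) / (E2 - E3) depends only on p, so p is found by bisection
    on [lo, hi]; amp and E_cbs then follow from two of the three equations.
    """
    (n1, n2, n3), (e1, e2, e3) = ns, energies
    target = (e1 - e2) / (e2 - e3)

    def residual(p):
        g1, g2, g3 = basis(n1, p), basis(n2, p), basis(n3, p)
        return (g1 - g2) / (g2 - g3) - target

    a, b = lo, hi
    if residual(a) * residual(b) > 0.0:
        raise ValueError("no solution in bracket; data may not follow this form")
    for _ in range(iterations):
        mid = 0.5 * (a + b)
        if residual(a) * residual(mid) <= 0.0:
            b = mid
        else:
            a = mid
    p = 0.5 * (a + b)
    amp = (e1 - e2) / (basis(n1, p) - basis(n2, p))
    e_cbs = e3 - amp * basis(n3, p)
    return e_cbs, amp, p


def extrapolate_scf(energies, ns=(3, 4, 5)):
    """Eq. (1): E_SCF(n) = E_CBS + A * exp(-alpha * sqrt(n)); returns (E_CBS, A, alpha)."""
    return _three_point_fit(ns, energies, lambda n, alpha: math.exp(-alpha * math.sqrt(n)))


def extrapolate_corr(energies, ns=(3, 4, 5)):
    """Eq. (2): E_corr(n) = E_CBS + B * n**(-beta); returns (E_CBS, B, beta)."""
    return _three_point_fit(ns, energies, lambda n, beta: n ** (-beta))
```

Feeding synthetic energies generated from known parameters back into these functions recovers those parameters, which is a convenient self-check before applying the same routine to real T/Q/5 energies.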


Again, $B$, $\beta$, and the CBS-extrapolated energy $E_{\mathrm{corr}}^{\mathrm{CBS}}$ are parameters
determined from a least-squares fitting algorithm as before. A detailed account of all
numbers is given in the Supporting Material (Table S5). Mean absolute errors (MAE)
between the density-functional approximation (DFA) relative energies and the
basis-set-extrapolated MP2 relative energies were calculated as follows:

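$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\Delta E_i^{\mathrm{DFA}} - \Delta E_i^{\mathrm{MP2}} + c\right| \qquad (3)$$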

where the index $i$ runs over all $N$ conformations of a given data set. $\Delta E_i$ in
principle denotes the energy difference between conformer $i$ and the lowest-energy
conformer of the set. The adjustable parameter $c$ is used to shift the MP2 and
DFA conformational hierarchies against one another to obtain the lowest possible
MAE, rendering the reported MAE value independent of the choice of any
reference structure. Figure 6 shows the resulting mean absolute
errors (MAE) and maximal errors ($\max_i|\Delta E_i^{\mathrm{DFA}} - \Delta E_i^{\mathrm{MP2}} + c|$) of different DFA
calculations, performed with the FHI-aims code, with respect to benchmarks
at the MP2 level obtained as described above. Within FHI-aims, the accuracy of
integration grids and of the electrostatic potential was also verified by comparing
“tight” and “really_tight” numerical settings, which gave virtually identical results.
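Minimizing Eq. (3) over c has a closed-form answer: a sum of absolute deviations is minimized by the median, so c = -median(ΔE_i^DFA - ΔE_i^MP2). The Python sketch below uses this to compute the MAE and the maximal error at the same optimal shift. Evaluating the maximal error at the MAE-optimal c is our assumption about how the two reported numbers are paired, and the function name is hypothetical.

```python
from statistics import median


def shifted_errors(dfa, mp2):
    """MAE (Eq. 3) and maximal error for two conformational hierarchies.

    dfa, mp2: energies of the same conformers in the same order (any common
    reference; the shift c absorbs it). Returns (mae, max_error, c).
    """
    if len(dfa) != len(mp2) or not dfa:
        raise ValueError("need two non-empty hierarchies of equal length")
    diffs = [e_dfa - e_mp2 for e_dfa, e_mp2 in zip(dfa, mp2)]
    c = -median(diffs)  # the median minimizes a sum of absolute deviations
    abs_errors = [abs(d + c) for d in diffs]
    return sum(abs_errors) / len(abs_errors), max(abs_errors), c
```

Shifting all DFA energies by a constant (for example, `[100.0, 110.0, 125.0]` instead of `[0.0, 10.0, 25.0]` against MP2 values `[0.0, 12.0, 20.0]`) changes only c, not the errors, which illustrates why the reported MAE does not depend on the choice of reference structure.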
The PBE+vdW level of theory yields an MAE well within chemical accuracy
(≲ 1 kcal/mol ≈ 43 meV) for both structural sets of Ala and Phe; for
Phe, the maximal error is ≈ 2 kcal/mol. We next applied a different long-range
dispersion treatment, a recent many-body dispersion model based on interacting
quantum harmonic oscillators denoted as MBD [...], which showed no significant
improvement for the isolated amino acids. In line with Ref. [76], applying the more
expensive PBE0 [105] hybrid exchange-correlation functional reduces the maximum
deviation for Phe to ≈ 57 meV, i.e., 1.3 kcal/mol. For Ala and Phe with
neutral end caps in complex with a Ca2+ cation, Figure 6 compares the same
set of DFAs to MP2 benchmark energy hierarchies. However, obtaining
basis-set-converged total energies of the same accuracy as for the isolated peptides
by straightforward CBS extrapolation proved considerably more difficult when
Ca2+ was involved. The reason is traced to the significant and slowly converging
correlation contribution of the Ca2+ semicore electrons, which leads to large
and conformation-dependent basis set superposition errors (BSSE). This problem
was verified for MP2 calculations in the FHI-aims and ORCA codes, with
several different basis set prescriptions [106], and for CCSD(T) calculations.
Standard DFAs, if sufficiently accurate, have a significant advantage in this
respect since they are not subject to comparable numerical convergence problems.
To nevertheless arrive at reliable CBS-extrapolated MP2 conformational energy
differences, we subjected the SCF and correlation energies of each Ca2+-coordinated
conformation to a counterpoise correction [107, 108] to minimize the
effect of BSSE on the Ca2+ correlation energy contribution, prior to performing
the CBS extrapolation as described above. For the example of Ala+Ca2+ and
assuming rigid conformers, the BSSE is estimated as:


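$$\begin{aligned}
E_{\mathrm{BSSE}} &= E_{\mathrm{BSSE}}(\mathrm{Ala}) + E_{\mathrm{BSSE}}(\mathrm{Ca}^{2+}), \quad \text{with}\\
E_{\mathrm{BSSE}}(\mathrm{Ala}) &= E^{\mathrm{Ala+Ca}^{2+}}(\mathrm{Ala}) - E^{\mathrm{Ala}}(\mathrm{Ala}), \quad \text{and} \qquad (4)\\
E_{\mathrm{BSSE}}(\mathrm{Ca}^{2+}) &= E^{\mathrm{Ala+Ca}^{2+}}(\mathrm{Ca}^{2+}) - E^{\mathrm{Ca}^{2+}}(\mathrm{Ca}^{2+}).
\end{aligned}$$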

Here $E^{\mathrm{Ala+Ca}^{2+}}(\mathrm{Ala})$ represents the energy of Ala evaluated in the union of the
basis sets on Ala and Ca2+, $E^{\mathrm{Ala}}(\mathrm{Ala})$ represents the energy of Ala evaluated
in the basis set on Ala, etc. The individual BSSE terms are then subtracted
from the SCF and correlation energies, respectively. Phe+Ca2+ is treated
equivalently. Complete numerical details are given in the Supplementary Material
(Table S6). Following this procedure, the MAE and maximal error values of
various DFAs compared to MP2 are well within 1 kcal/mol for Ala+Ca2+. The
PBE+vdW MAE for Phe+Ca2+ amounts to just above 2 kcal/mol. The
contributions from both the many-body dispersion and the hybrid PBE0
functional improve the MAE to just above 1 kcal/mol at the PBE0+MBD* level of
theory. The maximum errors in the energy hierarchies between individual
conformers are correspondingly larger. Overall, this assessment shows that our
database of conformer geometries constitutes, e.g., an excellent starting point
for more exhaustive future benchmark work on new electronic structure methods
for cation-peptide systems. For example, it would be very interesting to explore
how F12 approaches, which address the correlation energy convergence problem
explicitly, fare for a broad range of different Ca2+-containing conformations of
our peptides.
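The counterpoise bookkeeping of Eq. (4) is simple arithmetic once the four monomer energies are available for each basis-set level. The Python sketch below applies it level by level before extrapolation, as described above; the dictionary layout and function names are hypothetical, and the same function would be called separately for the SCF and the correlation energies. Because the larger union basis lowers the monomer energies, the BSSE estimate is negative and subtracting it raises the complex energy.

```python
def counterpoise_bsse(e_a_in_ab, e_a_in_a, e_b_in_ab, e_b_in_b):
    """Eq. (4): BSSE of a rigid complex A+B from monomer energies.

    e_x_in_ab: monomer X evaluated in the union basis of A and B (ghost functions on the partner);
    e_x_in_x: monomer X evaluated in its own basis only.
    """
    return (e_a_in_ab - e_a_in_a) + (e_b_in_ab - e_b_in_b)


def counterpoise_corrected(e_complex, e_a_in_ab, e_a_in_a, e_b_in_ab, e_b_in_b):
    """Subtract the BSSE estimate for every basis level n (all arguments are dicts keyed by n)."""
    return {
        n: e_complex[n] - counterpoise_bsse(e_a_in_ab[n], e_a_in_a[n], e_b_in_ab[n], e_b_in_b[n])
        for n in e_complex
    }
```

The corrected energies per level can then be passed to the CBS extrapolation of Eqs. (1) and (2).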












As a final validation, we compare calculated gas-phase amino acid-Ca2+ binding
energies to the binding energy hierarchy found experimentally in a study by
Ho et al. [109]. We calculate the binding energy at the PES level as

$$E_{\mathrm{binding}} = E_{\text{amino acid}} + E_{\mathrm{cation}} - E_{\mathrm{complex}}. \qquad (5)$$

Energies $E$ denote the PBE+vdW Born-Oppenheimer potential energies,
including $E_{\text{amino acid}}$ of the lowest-energy conformer of the isolated amino acid
and $E_{\mathrm{complex}}$ of the same amino acid in complex with a Ca2+ ion.
Experimentally [109], the gas-phase Ca2+ affinities of 18 proteinogenic amino acids
were determined by fragmenting Ca2+ complexes with a combinatorial library
of tripeptides at T ≈ … K, recording the mass spectrometric peak intensities
of different fragmentation products. Quantitative average relative binding
energies of Ca2+ to different amino acids were thus inferred and can be compared
to our findings, albeit with several important experiment-theory differences:
(i) entropy effects [..., 110] should affect the specific complexes probed
experimentally but cannot be included in the calculated numbers in exactly the
same way; (ii) there are structural differences (e.g., protonation, dimerized amino acids)
between the fragments recorded in experiment and the amino acids covered
here; (iii) experimental affinities are not given for Asp and Glu because
their gas-phase acidities, needed for data conversion, are not known. Figure 7
compares the experimentally and theoretically inferred Ca2+ binding affinities
qualitatively. The x axis reflects the experimental binding affinity hierarchy,
arranging amino acids from left to right in order of decreasing binding
affinity. The y axis shows calculated binding energies according to Eq. (5).
Perfect correlation of the experimental and calculated hierarchies would imply a
strictly monotonic decrease of calculated $E_{\mathrm{binding}}$ values from left to right. This
monotonic trend is not obeyed exactly; however, in view of the significant
differences (i) and (ii) above, the qualitative agreement is quite striking. Normalized
correlation coefficients between the experimental (1) and calculated (2) binding
affinity data were calculated following the formula:


$$r_{12} = \frac{s_{12}}{s_1 s_2}, \qquad (6)$$


with $s_{12}$ being the covariance of the data sets and $s_i$ being the standard deviation
of data set $i = 1, 2$. The resulting correlation coefficients, $r_{12} = 0.979$ and 0.909
for uncapped amino acids and dipeptides, respectively, also point to a remarkably
good overall agreement. Finally, Figure 7 also gives predicted $E_{\mathrm{binding}}$ values
for protonated (overall system charge +2) and deprotonated (overall system
charge +1) Asp and Glu, reflecting the significant electrostatic attraction
between cations and negatively charged (deprotonated) Asp and Glu side chains.
The binding energy data sets are included as Supplementary Table S7.
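Eqs. (5) and (6) translate directly into code. The Python sketch below is a minimal version with hypothetical function names; it uses the sample (n - 1) normalization for both the covariance and the standard deviations, a choice that cancels in r_12.

```python
import math


def binding_energy(e_amino_acid, e_cation, e_complex):
    """Eq. (5): positive when the complex lies below the separated fragments."""
    return e_amino_acid + e_cation - e_complex


def correlation_coefficient(x, y):
    """Eq. (6): r_12 = s_12 / (s_1 * s_2), the Pearson correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two data sets of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s12 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s1 = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s2 = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s12 / (s1 * s2)
```

A value of r_12 close to +1, as reported above, means that the calculated binding energies rise and fall together with the experimental affinities, even if the absolute scales differ.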


Usage Notes 

The present data contains stationary-point geometries (mainly minima, but also
saddle points, since no routine normal-mode analysis was performed) on the
potential-energy surfaces of the 20 proteinogenic amino acids and dipeptides, either
isolated or in complex with a divalent cation (Ca2+, Ba2+, Sr2+, Cd2+, Pb2+, or
Hg2+). Users of this data set may find openbabel [111] (www.openbabel.org)
to be a useful tool to convert FHI-aims and xyz files to other common file
formats in chemistry.
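Where installing an external converter is not an option, the FHI-aims geometry files described under Data Records can also be turned into XYZ with a short Python function. This sketch assumes only the `atom x y z element` lines (in Å) and `#` comment lines described above. It copies the first comment line, which according to the data description holds the total energy (in a format we have not verified), into the XYZ title line. The function name is hypothetical.

```python
def fhiaims_to_xyz(text, title=None):
    """Convert FHI-aims 'atom x y z element' lines (Å) into an XYZ-format string."""
    atoms, comments = [], []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            comments.append(stripped.lstrip("#").strip())
            continue
        fields = stripped.split()
        if len(fields) == 5 and fields[0] == "atom":
            x, y, z = (float(v) for v in fields[1:4])
            atoms.append((fields[4], x, y, z))
    if title is None:
        title = comments[0] if comments else ""  # e.g. the energy comment
    lines = [str(len(atoms)), title]
    lines += [f"{el:<2s} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a `.xyz` file and opened in any of the viewers mentioned under Data Records.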
[Figure 1 image: backbone forms (uncharged, zwitterion, dipeptide), divalent cations, and side chains of the 20 proteinogenic amino acids, including the protomers HisD, HisE, HisH, GluH/Glu, AspH/Asp, and Arg/ArgH.]
Figure 1: Molecular systems covered in this study. Top left and center:
Schematic depiction of the backbone conformations of uncharged, zwitterionic,
and dipeptide forms of the amino acids considered in this work. Side chains are
indicated by the letter R. Top right: Divalent ions considered for complexation
with the 20 proteinogenic amino acids. Lower five rows: Side chains, including
different protonation states where applicable, of the 20 proteinogenic amino
acids considered in this work.
[Figure 2 image: search workflow for the bare and Ca2+-bound systems; the other cations (Ba2+, Sr2+, Cd2+, Pb2+, Hg2+) are treated by analogy.]

Figure 2: Schematic representation of the workflow employed to locate 
stationary points on the potential-energy surfaces of the respective molecular 
systems. 
[Figure 3 image: bar charts of the numbers of conformers identified in Steps 2 and 3 for (a) uncapped amino acids and (b) dipeptides, each bare and with Ca2+.]
Figure 3: Numbers of stationary points of the PBE+vdW potential-energy
surface (PES) at the “tight”/tier-2 level of accuracy that were found for the
different a) uncapped amino acids or b) dipeptides in isolation (“bare”) or with a
Ca2+ cation. Blue segments of the bars and blue shaded numbers give the number
of stationary points (“conformers”) located in Step 2 of the search procedure
detailed in Figure 2. Red bar segments and red shading highlight the number of
conformers that were additionally found during Step 3 of the search. The total
number of conformers found for each system is the sum of the numbers found
in Steps 2 and 3.
Figure 4: Schematic representation of the folder organization of the data.
Each folder, as exemplified for the Ca2+-coordinated cysteine dipeptide, contains
coordinate files in two formats (standard XYZ and FHI-aims input), the
computational settings file for FHI-aims (control.in), and the energy hierarchies
(PBE+vdW, “tight”/tier-2 level) per system.
Figure 5: Comparison of search strategies. (a) The conformational
energy hierarchies for alanine after the global search and the local refinement,
together with the reference hierarchy at the DFT-PW91 level that was published
by Maul et al. [...]. Conformers indicated by black lines were found in the global
search; the conformers in red were located only after the local refinement step.
The blue line in the reference conformational hierarchy represents a minimum
not found in our search and not present at the PBE+vdW level. (b) Conformations
of the alanine molecule. Conformers marked with an asterisk (*) were
found in the local refinement step of our search strategy. Atoms are color-coded
as follows: cyan (C), blue (N), red (O), white (H). The conformer labeled with
X was found by Maul et al. in PW91 calculations but is unstable at the
PBE+vdW level.
Figure 6: Comparison of different DFAs to MP2 energies. Mean absolute
error (MAE) and maximal error (in meV) between relative energies
at the DFA (PBE+vdW, PBE+MBD*, and PBE0+MBD*) and MP2 levels of
theory, using the structures of minima obtained at the PBE+vdW level from the
database for the systems Ala and Phe with neutral end caps, both in isolation
and in complex with a Ca2+ cation. Computational details are given in the text.
Exact numbers are summarized in Table 1.
Figure 7: Comparison of the gas-phase binding energies of Ca2+ to
different amino acids calculated in this work (y axis) to the experimentally
inferred hierarchy of gas-phase binding energies of Ca2+ to different amino acids
by Ho et al. [109]. The amino acids are ordered along the x axis from the highest
to the lowest experimental Ca2+ binding energy. Protonated and deprotonated Asp
and Glu are not included among the experimental data and are here shown as
predictions. E_binding is high for deprotonated Asp and Glu since these forms of
the amino acid carry a negative charge.
Table 1: Mean absolute error (MAE) and maximal error (in meV; in parentheses:
in kcal/mol) between relative energies at the DFA (PBE+vdW,
PBE+MBD*, and PBE0+MBD*) and MP2 levels of theory, using the structures of
minima obtained at the PBE+vdW level from the database for the systems
Ala and Phe with neutral end caps, both in isolation and in complex with a
Ca2+ cation. Computational details are given in the text.

System      DFA          MAE [meV]    Maximal error [meV]
Ala         PBE+vdW       24 (0.5)     44 (1.0)
Ala         PBE+MBD*      23 (0.5)     44 (1.0)
Ala         PBE0+MBD*     13 (0.3)     28 (0.6)
Phe         PBE+vdW       25 (0.6)     78 (1.8)
Phe         PBE+MBD*      26 (0.6)     77 (1.8)
Phe         PBE0+MBD*     16 (0.4)     57 (1.3)
Ala+Ca2+    PBE+vdW       17 (0.4)     23 (0.5)
Ala+Ca2+    PBE+MBD*      15 (0.3)     22 (0.5)
Ala+Ca2+    PBE0+MBD*      9 (0.2)     15 (0.3)
Phe+Ca2+    PBE+vdW      105 (2.4)    225 (5.2)
Phe+Ca2+    PBE+MBD*      61 (1.4)    146 (3.4)
Phe+Ca2+    PBE0+MBD*     50 (1.2)    104 (2.4)

Tables 
Further tables are provided in a Microsoft Excel file and as tab-delimited text
files as Supporting Information to this article:

Table S1 Parameters specific to the REMD simulations of the different systems:
the number of Replicas, the probability of Acceptance, the Time between
exchange attempts, and the Temperature range of the replicas.

Table S2 Number of conformers found in the different stages (after global
search and after refinement) of the search scheme for amino acids,
dipeptides, and complexes thereof with Ca2+ cations. For the amino acids, the
basin hopping search was performed starting from the non-zwitterionic as
well as from the zwitterionic state. These numbers are separated by a “+”
in the respective column.

Table S3 Numbers of conformers found for the amino acids (AA) and their
complexes with the investigated divalent cations.

Table S4 Numbers of conformers found for the dipeptides (Dip.) and their
complexes with the investigated divalent cations.

Table S5a Extrapolation of SCF energies as proposed by Karton and Martin:
$E_{\mathrm{SCF}}^{n} = E_{\mathrm{SCF}}^{\mathrm{CBS}} + A\,e^{-\alpha\sqrt{n}}$, with $n = 3, 4, 5$; $A$, $\alpha$, and
$E_{\mathrm{SCF}}^{\mathrm{CBS}}$ to be determined by a least-squares fit; perfect fit as
#parameters = #datapoints = 3; all values in eV.

Table S5b Extrapolation of MP2 correlation energies as proposed by Truhlar:
$E_{\mathrm{corr}}^{n} = E_{\mathrm{corr}}^{\mathrm{CBS}} + B\,n^{-\beta}$, with $n = 3, 4, 5$; $B$, $\beta$, and
$E_{\mathrm{corr}}^{\mathrm{CBS}}$ to be determined by a least-squares fit; perfect fit as
#parameters = #datapoints = 3; all values in eV.

Table S6 Basis set superposition errors (BSSE) for SCF and MP2 correlation
energies with n = T/Q/5; all values in eV.

Table S7 Relative gas-phase Ca2+ binding energies for the amino acids from
experiments by Ho et al. [109] and absolute binding energies in the gas
phase from DFT-PBE+vdW calculations for amino acids and dipeptides.